ABSTRACT

A vast amount of research on the regulation of gene 
expression has relied on plasmid reporter assays. In 
this study, we show that plasmids widely used for this 
purpose constitutively produce substantial amounts 
of RNA from a TATA-containing cryptic promoter 
within the origin of replication. Readthrough of 
these RNAs into the intended transcriptional unit 
potently stimulated reporter activity when the 
inserted test sequence contained a 3' splice site 
(ss). We show that two human sequences, originally 
reported to be internal ribosome entry sites and later 
to instead be promoters, mimic both types of element 
in dicistronic reporter assays by causing these 
cryptic readthrough transcripts to splice in patterns 
that allow efficient translation of the downstream 
cistron. Introduction of test sequences containing 
3' ss into monocistronic luciferase reporter vectors 
widely used in the study of transcriptional regulation 
also created the false appearance of promoter 
function via the same mechanism. Across a large 
number of variants of these plasmids, we found a 
very highly significant correlation between reporter 
activity and levels of such spliced readthrough
transcripts. Computational estimation of the frequency
of cryptic 3' ss in genomic sequences suggests that
misattribution of cis-regulatory function may be a
common occurrence. 

INTRODUCTION 

Plasmid-based reporter assays were first used in animal 
cells nearly 30 years ago (1) and remain today the most 
common method of functional testing of candidate 
cis-regulatory sequences. These assays have perhaps
found the most widespread application in the study of
transcriptional control. Although newer approaches for
identifying regulatory elements, such as those based on
chromatin immunoprecipitation, have yielded an
abundance of information on protein-DNA interactions,
plasmid reporter assays remain the favored method for 
assessing the functional activity in candidate regulatory 
sequences (2,3). The study of internal ribosome entry 
site (IRES) elements has also heavily depended on 
plasmid reporter assays, and the standard test for these 
elements is the dicistronic assay (4,5). 

Although the integrity and structure of the transcripts
generated in reporter assays for cis-regulatory function are
critical for the validity of these tests, RNA structural
analysis is only rarely performed. Mechanisms such as 
cryptic initiation or aberrant RNA processing may result 
in unanticipated species that produce anomalous reporter 
activity and fundamentally alter the outcome of these 
experiments. As an example, we and others have shown 
previously that splicing events involving 3' splice sites (ss) 
in test sequences can cause false positive scores for IRES 
function in reporter assays (6-8). 

An overwhelming majority of reporter plasmids, and 
molecular cloning vectors in general, have backbone 
sequences originally derived from the prototype 
Escherichia coli cloning vehicle pBR322 (9) and contain 
the origin of replication (ori) of pMB1 (10) or, more
commonly, a point mutant of this ori from the pUC 
series of plasmids (11,12). It has been recognized that 
plasmid backbone sequences can have some effect on 
eukaryotic expression cassettes contained within the 
same plasmid and may consequently influence the results 
of reporter assays. Accordingly, as a safeguard to prevent 
aberrant RNAs from reading into the intended
transcriptional unit, many reporter plasmids now contain an
upstream polyadenylation signal. Such signals are 
believed to reliably prevent the occurrence of readthrough 
transcripts that could mediate spurious reporter gene 
expression (2). 

Here, we demonstrate that, despite such safeguards, a 
cryptic promoter in the pMB1 ori gives rise to transcripts
that can readily create the artifactual appearance of
transcriptional or translational regulatory function in
reporter assays. Evaluating multiple widely used
plasmids, including some that have been employed in
thousands of published studies, we show that splicing
between sites in the pMB1-driven transcripts and within
inserted test sequences can cause robust expression of the
downstream reporter gene whether or not the insert
possesses genuine transcriptional or translational
stimulatory activity. Additionally, we provide new evidence that
the c-myc 5' untranslated region (UTR) itself harbors a 
cryptic core promoter responsible for its apparent IRES 
function in plasmid tests. 

MATERIALS AND METHODS 

Plasmids 

Plasmid pRF and promoterless variants containing the 
eIF4G, XIAP and c-myc inserts have been described
previously (13-15). Jian-Ting Zhang (Indiana University)
provided the pRF plasmids containing the eIF4G
sequence. The pRF plasmids containing the XIAP and
c-myc sequences were provided by Greg Goodall and
Andrew Bert (University of Adelaide). The pβgal/CAT
plasmids (16,17) were provided by James Smiley and
Holly Saffran (University of Alberta). To construct pRF
variants containing a non-specific spacer or fragments of
the rabbit β-globin (RBG) or adenovirus E2A genes
harboring 3' ss, we annealed synthetic oligonucleotides
and inserted them at the SpeI and NcoI sites of pRF
containing or lacking the SV40 promoter (SV40P). The
oligonucleotides were as follows: spacer,
5'-CTAGTTCGACTGGACACTGGATCTACTACATCGATTGCTGAACGGC-3' and
5'-CATGGCCGTTCAGCAATCGATGTAGTAGATCCAGTGTCCAGTCGAA-3'; β-globin,
5'-CTAGTGCTAACCATGTTCATGCCTTCTTCTTTTTCCTACAGGGCAC-3' and
5'-CATGGTGCCCTGTAGGAAAAAGAAGAAGGCATGAACATGGTTAGCA-3'; E2A,
5'-CTAGTACTGACTCCATGATCTTTTTCTGCCTATAGGACAC-3' and
5'-CATGGTGTCCTATAGGCAGAAAAAGATCATGGAGTCAGTA-3'.
To generate pRF containing no insert, we removed the
insert from pR-eIF4G-F with SpeI and NcoI, blunted
the ends with T4 DNA polymerase and ligated the
resulting fragment. The pRF plasmids containing the eIF4G
and XIAP sequences with mutated polypyrimidine tracts 
(PPTs) were constructed by PCR-based mutagenesis. The 
five mutations introduced were selected based upon their 
being the most infrequent bases in the PPTs of naturally 
occurring 3' ss (18). 
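Each insert above depends on its two oligonucleotides annealing into a duplex whose single-stranded 5' ends match the SpeI (CTAG) and NcoI (CATG) overhangs of the cut vector. The minimal Python sketch below checks this for the three pairs listed, using the sequences as transcribed from the text; the helper names are our own, not part of any cloning package.

```python
# Sketch: confirm that each annealed oligonucleotide pair forms a duplex with
# a 5'-CTAG overhang (SpeI-compatible) and a 5'-CATG overhang (NcoI-compatible).
# Sequences are transcribed from the Methods; helper names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


OLIGO_PAIRS = {
    "spacer": (
        "CTAGTTCGACTGGACACTGGATCTACTACATCGATTGCTGAACGGC",
        "CATGGCCGTTCAGCAATCGATGTAGTAGATCCAGTGTCCAGTCGAA",
    ),
    "beta-globin": (
        "CTAGTGCTAACCATGTTCATGCCTTCTTCTTTTTCCTACAGGGCAC",
        "CATGGTGCCCTGTAGGAAAAAGAAGAAGGCATGAACATGGTTAGCA",
    ),
    "E2A": (
        "CTAGTACTGACTCCATGATCTTTTTCTGCCTATAGGACAC",
        "CATGGTGTCCTATAGGCAGAAAAAGATCATGGAGTCAGTA",
    ),
}


def forms_sticky_duplex(top: str, bottom: str) -> bool:
    """True if top/bottom anneal with 4-nt SpeI (top) and NcoI (bottom) 5' overhangs."""
    if not (top.startswith("CTAG") and bottom.startswith("CATG")):
        return False
    # The double-stranded core excludes each strand's 4-nt 5' overhang.
    return top[4:] == revcomp(bottom[4:])


for name, (top, bottom) in OLIGO_PAIRS.items():
    print(name, forms_sticky_duplex(top, bottom))
```

The same check is a cheap guard against transcription errors when oligonucleotide sequences are copied from a paper into an order form.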

pGL3-Control, pGL3-Promoter, pGL3-Enhancer, 
pGL3-Basic, pGL4.17 and pGL4.10 were from Promega. 
pGL3-Enhancer-RBG, a variant of pGL3-Enhancer
containing the β-globin 3' ss, was derived by cutting
promoterless pRF containing the 3' ss with SpeI and NotI,
blunting the ends with T4 DNA polymerase and
recircularizing the plasmid by ligation. The MluI-NcoI
fragment of pGL3-Enhancer-RBG including the β-globin
3' ss was excised and inserted into the same sites in
pGL3-Basic to generate pGL3-Basic-RBG. The SacI-NcoI
fragment of pGL3-Enhancer containing the 3' ss was
transferred to pGL4.17 and pGL4.10 to generate the
β-globin 3' ss-containing variants of these plasmids.

Cell culture and transfection 

HeLa cells were used throughout and were grown at 37°C
under 5% CO2 in Dulbecco's modified Eagle's medium
supplemented with 10% fetal bovine serum, 100 units/ml
penicillin and 100 µg/ml streptomycin. Transfections were
performed with Fugene 6 (Roche) according to the
manufacturer's recommendations. For reporter assays, 700 ng of
luciferase plasmid was cotransfected with 300 ng of green
fluorescent protein (GFP) expression plasmid
pGFPemd-cmv[R]-control (Packard Biosciences), and transfection
efficiency was assessed by flow cytometric analysis of cells with
a Beckman Coulter EPICS XL cytometer to determine the
percentage of GFP-positive cells. For RNA isolation, 1 µg
of each luciferase plasmid alone was used.

Luciferase assays 

For dual measurement of both RLuc and FLuc activities 
in transfected cells, the Dual-Luciferase Reporter Assay 
System (Promega) was used and cell lysates were prepared 
with Passive Lysis Buffer. For measurement of FLuc 
activities alone, the Luciferase Assay System with 
Reporter Lysis Buffer was used. Luciferase activities in
20-µl samples of each lysate were determined according to
the manufacturer's standard protocol with a Sirius
luminometer (Berthold). For relative light unit (RLU)
calculations, background autoluminescence values,
corresponding to luminescence measured with lysates of
mock-transfected cells, were subtracted from each measured
value.
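The RLU arithmetic described above (background subtraction, then comparison of a test construct with its insert-free parent) can be sketched as follows. All readings are hypothetical placeholders rather than data from this study, and averaging replicate wells before subtraction is an assumption, since the text does not state the order of operations.

```python
# Sketch of the RLU arithmetic: subtract mock-transfected background, then
# express a test construct relative to a baseline construct.
# All readings below are hypothetical placeholders, not data from this study.
from statistics import mean


def net_rlu(readings, mock_readings):
    """Mean luminescence minus mean autoluminescence of mock-transfected lysates."""
    return mean(readings) - mean(mock_readings)


def fold_stimulation(test_rlu, baseline_rlu):
    """Ratio of background-corrected activity of a test construct to its baseline."""
    if baseline_rlu <= 0:
        raise ValueError("baseline must be above background")
    return test_rlu / baseline_rlu


mock = [40, 60]                                   # hypothetical mock lysates
with_insert = net_rlu([1050, 950, 1000], mock)    # 950 RLU
empty_vector = net_rlu([150, 140, 145], mock)     # 95 RLU
print(fold_stimulation(with_insert, empty_vector))
```

Because fold-stimulation divides by the baseline, the same absolute activity yields a smaller fold-change from a vector with high background expression, which is the situation described later for pGL3-Enhancer.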

RNA isolation and reverse transcription for RT-PCR 

The RNeasy Mini kit with optional DNase I treatment 
(Qiagen) was used for isolation of RNA from transfected 
cells. Qiashredder columns were used to homogenize the 
cells prior to RNA purification. Two micrograms of each 
RNA was reverse transcribed with Superscript III reverse 
transcriptase and oligo(dT) primer (Invitrogen). The 
resulting cDNA was treated with RNase H (Invitrogen). 

5' rapid amplification of cDNA ends 

5' RACE was performed essentially according to the method 
of Matz et al. (19) on RNA isolated as described above. 
First-strand cDNA was synthesized with Superscript II 
reverse transcriptase (Invitrogen) and primer 5' jjjjjjj 
TTTTTTTTTTTTTTTTTTVN-3'. The oligonucleotide
5'-CAGATGGACGACTTGCGATAGACACGGG-3' was included in the
reaction to incorporate a primer-binding site at the 5' end of
the cDNA through the template-switching effect. The cDNA
was then treated with RNase H and used as template in PCR
with Phusion polymerase (Finnzymes) and forward primers
5'-ATAGAGCAGTAGTGACTCCGAACAGATGGACGACTTGCGATAGA-3'
(at 0.06 µM) and 5'-ATAGAGCAGTAGTGACTCCGAA-3' (at 0.4 µM).
The reverse primer used (at 0.4 µM) depended on the source
of the RNA being analysed. For pRF and pGL3 RNA, the
luc+-specific reverse primer 5'-GGCCTTTCTTTATGTTTTTGGCGTCTT-3'
was used. For pβgal/CAT RNA, the CAT-specific reverse
primer 5'-ACGGTCTGGTTATAGGTACATTGAGCAA-3' was used.
For pGL4 RNA, the luc2-specific reverse primer
5'-GGTCCCGTCTTCGAGTGGGTAGAAT-3' was used.
The resulting products were directly cloned with the Zero 
Blunt PCR Cloning Kit after cleanup with a PureLink 
PCR purification kit (both Invitrogen) and sequenced. 
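The two forward primers above are related: the long primer is the short primer fused to the first 22 nt of the template-switching oligonucleotide, so products primed by the long primer carry a tail that the more abundant short primer then amplifies. The Python sketch below checks that relationship and shows one way to locate the transcript 5' end in a sequenced clone. The helper `transcript_start` and its anchor choice (the last 12 nt of the oligo) are assumptions for illustration, not the procedure used in the study.

```python
# Sketch: sanity checks for the nested 5' RACE design described above.
# Primer and oligo sequences are from the text; the helper is hypothetical.
TS_OLIGO = "CAGATGGACGACTTGCGATAGACACGGG"   # template-switching oligo
OUTER_FWD = "ATAGAGCAGTAGTGACTCCGAACAGATGGACGACTTGCGATAGA"  # used at 0.06 uM
INNER_FWD = "ATAGAGCAGTAGTGACTCCGAA"         # used at 0.4 uM

# The long primer is the short primer fused to the first 22 nt of the
# template-switching oligo.
assert OUTER_FWD == INNER_FWD + TS_OLIGO[:22]


def transcript_start(clone: str, anchor: str = TS_OLIGO[-12:]) -> int:
    """Index of the first base after the oligo-derived sequence in a RACE clone, or -1.

    The base at this index approximates the transcript 5' end; non-templated
    nucleotides added at the template switch can shift it by one or more bases.
    """
    i = clone.find(anchor)
    return -1 if i < 0 else i + len(anchor)


example_clone = INNER_FWD + TS_OLIGO + "ACTGAAC"   # hypothetical clone read
print(transcript_start(example_clone))
```

Mapping the returned position onto the plasmid sequence is what assigns each clone to an initiation site, such as the site within the ori reported in the Results.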

Standard RT-PCR 

Amplification of reverse transcribed RNA was carried out 
with Phusion polymerase and primers from Invitrogen. 
The primer sequences are shown in Supplementary
Figure S1. The cycling conditions for amplification of
cryptic readthrough transcripts from pGL3 and pβgal/CAT
plasmids were 32 cycles of the following: 98°C for
7 s, 53°C for 20 s and 72°C for 12 s, followed by 5 min at
72°C. The conditions for amplification of the intended
dicistronic transcripts from pβgal/CAT plasmids were 30
cycles of the following: 98°C for 7 s, 56°C for 20 s and
72°C for 90 s, followed by 5 min at 72°C. The reaction
products were run on an agarose gel and the predominant 
bands were excised, column purified, cloned with the Zero 
Blunt PCR Cloning Kit and sequenced. 

Quantitative RT-PCR 

Real-time RT-PCR was performed with primers from 
Invitrogen, probes and TaqMan Gene Expression 
Master Mix from Applied Biosystems and a MyiQ2 
Real-Time Detection System from Bio-Rad. The primer 
and probe sequences are shown in Supplementary
Figure S1. The cycling conditions were 10 min at 95°C
followed by 40 cycles of 15 s at 95°C and 1 min at 60°C.
β-Actin was used as an endogenous reference to normalize
relative levels of the RNA targets (Applied Biosystems 
human ACTB endogenous control). Standard curves for 
each primer-probe set were generated for determination of 
amplification efficiencies. The reaction efficiencies and 
threshold cycle values obtained were used to calculate 
relative levels of each target sequence with the Microsoft 
Excel-based Q-Gene application (20). All values shown 
were normalized to set the level of spliced readthrough 
cryptic RNA from pRF containing the globin insert at 
an arbitrary value of 10000. 
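The quantification steps above (efficiency from a standard curve, efficiency-corrected normalization to β-actin, rescaling to 10000) can be sketched in Python. This assumes the normalized-expression formula used by Q-Gene, NE = E_ref^Ct_ref / E_target^Ct_target, with E the per-cycle amplification factor; the dilution series and Ct values are hypothetical.

```python
# Sketch of efficiency-corrected relative quantification, assuming the
# normalized-expression formula used by Q-Gene:
#   NE = E_ref ** Ct_ref / E_target ** Ct_target
# where E is the amplification factor per cycle (2.0 = 100% efficiency).
# Dilution series and Ct values below are hypothetical.
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_factor(log10_input, ct):
    """E from a standard curve of Ct against log10(template input): E = 10^(-1/slope)."""
    slope, _ = fit_line(log10_input, ct)
    return 10 ** (-1.0 / slope)


def normalized_expression(e_target, ct_target, e_ref, ct_ref):
    """Target abundance relative to the reference gene (here, beta-actin)."""
    return e_ref ** ct_ref / e_target ** ct_target


def rescale(values, calibrator, to=10000):
    """Scale values so that the calibrator sample equals `to` (10000 in the text)."""
    return [to * v / calibrator for v in values]


# Hypothetical ideal 10-fold dilution series: slope of about -3.32 gives E of about 2.0.
dilutions = [0, 1, 2, 3]
ct_curve = [30 - c / math.log10(2) for c in dilutions]
e = amplification_factor(dilutions, ct_curve)

calibrator = normalized_expression(e, 20.0, e, 16.0)   # e.g. the globin construct
sample = normalized_expression(e, 22.0, e, 16.0)       # target reaches threshold 2 cycles later
print(rescale([calibrator, sample], calibrator))       # approximately [10000.0, 2500.0]
```

With near-ideal efficiency, a 2-cycle delay corresponds to a 4-fold lower level. Estimating E from each primer-probe set's standard curve, as the text describes, keeps unequal efficiencies from skewing that ratio.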

Statistics 

Simple linear regression and correlation analysis were 
performed on log-transformed means of RNA and FLuc 
expression levels using Prism 5 software (GraphPad). 

Determination of incidence of replication 
origins in GenBank 

To determine the number of GenBank sequences containing
the ColE1, p15A, pMB1/pUC or pSC101 replication
origins, homology searches of the nr/nt nucleotide
collection were performed using the Megablast algorithm and
the following query sequences:
ColE1, CGGATTAGCAGAGCGATGATGGCACAAACGGTGCTACAGAGTTCTTGAAGTAGTGGCCCG;
p15A, CCACTGGTAATTGATTTAGAGGAGTTAGTCTTGAAGTCATGCGCCGGTTAAGGCTAAACT;
pMB1/pUC, CAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAA; and
pSC101, ACAGAGGGTCTAGCAGAATTTACAAGTTTTCCAGCAAAGGTCTAGCAGAATTTACAGATA.
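The four query sequences can also be compared directly with one another. The Python sketch below uses a plain Needleman-Wunsch global alignment to report percent identity relative to the pMB1/pUC query. This is not Megablast, and the scoring scheme (match +1, mismatch -1, linear gap -2) is an arbitrary assumption, so the numbers it prints are illustrative only.

```python
# Sketch: a plain Needleman-Wunsch global alignment to compare the replication
# origin query sequences listed above. This is NOT Megablast; the scoring
# (match +1, mismatch -1, linear gap -2) is an arbitrary assumption.
ORI_QUERIES = {
    "ColE1": "CGGATTAGCAGAGCGATGATGGCACAAACGGTGCTACAGAGTTCTTGAAGTAGTGGCCCG",
    "p15A": "CCACTGGTAATTGATTTAGAGGAGTTAGTCTTGAAGTCATGCGCCGGTTAAGGCTAAACT",
    "pMB1/pUC": "CAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAA",
    "pSC101": "ACAGAGGGTCTAGCAGAATTTACAAGTTTTCCAGCAAAGGTCTAGCAGAATTTACAGATA",
}


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity over the columns of an optimal global alignment."""
    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j),
                          s[i - 1][j] + gap, s[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting identical columns.
    i, j, same, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(i, j):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * same / columns


ref = ORI_QUERIES["pMB1/pUC"]
for name, seq in ORI_QUERIES.items():
    print(f"{name:9s} {global_identity(ref, seq):5.1f}% identical to pMB1/pUC")
```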

Estimation of incidence of cryptic 3' ss in genomic 
sequence 

To generate sequence to serve as a model of the upstream 
region of human genes, we used the random-seq program 
of regulatory sequence analysis tools (RSAT) (21), with 
the default Markov chain order value of five. A total of 300 kb of
random sequence was screened for potential 3' ss with the
programs NetGene2 (22,23) (http://www.cbs.dtu.dk/services/NetGene2),
NNSplice (24) (http://www.fruitfly.org/seq_tools/splice.html),
MaxEntScan (25) and Human Splicing Finder (HSF) (26). HSF and
MaxEntScan analyses were both performed at
http://www.umd.be/HSF. The scores for the eIF4G, XIAP,
globin and E2A 3' ss were obtained with the pRF
plasmids containing the corresponding inserts. The
scores for the cryptic 3' ss identified by RT-PCR in the
pGL3 plasmids were obtained with the corresponding
complete plasmid sequences. The estimated average
score for aberrant splice sites in DBASS3 (27)
(http://www.dbass.org.uk/dbass3) was obtained by randomly
selecting one of every five 3' ss in the database, determining
their scores in the context of their native pre-mRNA, and
averaging.
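The random-sequence model above uses a fifth-order Markov chain: each base is drawn conditioned on the five preceding bases, with transition frequencies learned from real sequence. The Python sketch below illustrates that idea. It is not RSAT's random-seq implementation, and the training sequence is an arbitrary placeholder rather than the human upstream-region model RSAT provides.

```python
# Sketch of a fifth-order Markov background generator in the spirit of RSAT
# random-seq: transition counts are learned from a training sequence, then
# each new base is drawn conditioned on the preceding five. The training
# sequence is an arbitrary placeholder, not RSAT's human upstream model.
import random
from collections import Counter, defaultdict


def train_markov(seq: str, order: int = 5):
    """Count which base follows each `order`-mer in the training sequence."""
    model = defaultdict(Counter)
    for i in range(len(seq) - order):
        model[seq[i:i + order]][seq[i + order]] += 1
    return dict(model)


def generate(model, length: int, order: int = 5, seed: int = 0) -> str:
    """Draw `length` bases from the Markov model (reproducible via `seed`)."""
    rng = random.Random(seed)
    contexts = list(model)
    out = list(rng.choice(contexts))
    while len(out) < length:
        counts = model.get("".join(out[-order:]))
        if counts is None:               # unseen context: borrow a random one
            counts = model[rng.choice(contexts)]
        bases, weights = zip(*counts.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


training = "ACGTTGCAGGCTTACCGATAGCTTAGGCCCTGA" * 20   # placeholder training data
model = train_markov(training)
background = generate(model, 30_000)    # 30 kb here; the Methods used 300 kb
print(len(background), background[:60])
```

A higher-order chain reproduces local composition (for example, CpG depletion or pyrimidine runs) better than shuffling, which matters when counting motifs such as polypyrimidine tracts.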

RESULTS 

Test sequences containing 3' ss mimic both IRES elements 
and promoters in pRF 

Sequences of the human eIF4G and XIAP genes have been 
reported to possess IRES activity (16,28), based on their 
ability to stimulate expression of a downstream gene in 
dicistronic reporter assays. Subsequent studies questioned 
these conclusions and attributed the observed activities 
to cryptic promoters within the two sequences (13,14), 
based primarily on the results of reporter assays with 
promoterless variants of the widely used plasmid pRF 
(Supplementary Figure S2A) (15), a dicistronic vector 
derived from the popular pGL3 luciferase reporter 
plasmids. We noted that, interestingly, a short region 
within the eIF4G sequence found to be essential for its 
apparent cryptic promoter function precisely overlaps a 3' 
ss the sequence contains (Supplementary Figure S2B). 
Similarly, the putative IRES of the XIAP gene has also 
been reported to possess 3' ss activity (8), and whether 
this activity is inherent to the XIAP sequence or artifactual 
has been a subject of debate (17,29). We subsequently 
showed that both the eIF4G and XIAP sequences 
contain 3' ss (6) that are active in a retroviral replication 
assay and are utilized in the pre-mRNAs of the eIF4G and 
XIAP genes themselves. 

Here, we further examined the eIF4G and XIAP 
elements in the context of pRF with or without its 
promoter. As in previous studies with these constructs 
(13,14), the promoterless variants lacked both the SV40
promoter and the chimeric intron. In the same manner, we
also tested the c-myc 5' UTR, which similarly was reported
to be an IRES (15,30) and later to be a promoter (13).
Furthermore, to determine if other sequences containing
3' ss could produce the appearance of IRES and
promoter function in assays with these plasmids, we
introduced short 3' ss-containing segments from the
β-globin and adenovirus E2A genes as test sequences.

As previously reported, the eIF4G, c-myc and XIAP 
inserts all stimulated robust expression of FLuc from 
pRF (Figure 1). Interestingly, the 3' ss-containing globin
and E2A fragments also stimulated FLuc expression with
comparable potency. As expected, deletion of the promoter
and chimeric intron vastly reduced expression of the
upstream RLuc cistron from all constructs. By contrast,
this deletion had almost no effect on FLuc expression
from any of the plasmids. These findings suggested that
3' ss can mimic both IRES and promoter elements in this
reporter system and, in fact, that this type of assay alone
cannot distinguish whether a given element
has splicing, IRES or promoter activity. Furthermore, the
observed persistence of FLuc expression despite deletion of 
the SV40P indicated that the test inserts that stimulated 
second-cistron expression either contain cryptic promoter 
function themselves or were functioning as 3' ss within 
cryptic transcripts initiating at unknown upstream 
locations within the plasmid. 

To assess the requirement of the 3' ss of the eIF4G and 
XIAP sequences for their putative IRES and promoter 
function, we tested variants in which the PPTs of the 3'
ss were mutated to contain several purines (Supplementary
Figure S3). The mutant forms stimulated FLuc
expression from both promoter-containing and
promoterless pRF to levels of ~2-4% of those of the wild-type
sequences, suggesting that splicing at these sites is indeed
required for both apparent IRES and promoter activity.

Apparent IRES and promoter function in the eIF4G and 
XIAP sequences is mediated by cryptic readthrough 
transcripts from the plasmid ori 

To identify the initiation sites of the transcripts respon- 
sible for FLuc expression from the promoterless eIF4G, 
XIAP and c-myc constructs, we performed 5' RACE, 
sequencing 22-23 clones per construct. Remarkably, all 
but one of the transcripts from the eIF4G and XIAP 
plasmids had initiated from within the plasmid ori 
(Figure 2A and B), and none from within the test insert
itself. In each transcript, the previously identified 3' ss
within the insert had spliced with a cryptic 5' ss encoded
by the plasmid backbone, thereby positioning the FLuc
cistron near the 5' end. These cryptic transcripts had
read into the reporter gene unit despite the presence of
an upstream transcription termination element. About
half were free of AUGs upstream of the FLuc cistron
that might impair translation by ribosomes scanning
from the m7G cap. The plasmid sequence 5' to the
major transcription initiation site in the ori harbors
striking and fortuitous homologies to TATA and GC
boxes (Figure 2C) at locations relative to the initiation
site that are typical for these elements in eukaryotic
promoters. Also surprisingly, one cloned transcript from the
eIF4G plasmid had been formed by trans-splicing between
the eIF4G 3' ss and a 5' ss of an endogenous cellular RNA
(Supplementary Figure S4).
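Whether a spliced readthrough transcript is a good FLuc template depends partly on upstream AUGs in its 5' leader. Under the scanning model, an out-of-frame upstream AUG can divert ribosomes, whereas an in-frame one with no intervening stop codon would instead give an N-terminally extended protein. The Python sketch below lists upstream AUGs and their frame relative to the reporter start codon; the example sequences are hypothetical, and stop codons are not checked.

```python
# Sketch: list upstream AUGs in a transcript 5' leader and whether each is in
# frame with the reporter start codon. Sequences are hypothetical; positions
# are 0-based, and U is treated as T. Intervening stop codons are not checked.
def upstream_augs(transcript: str, cds_start: int):
    """Return [(position, in_frame)] for every AUG starting before cds_start."""
    seq = transcript.upper().replace("U", "T")
    if seq[cds_start:cds_start + 3] != "ATG":
        raise ValueError("cds_start does not point at an AUG")
    return [(i, (cds_start - i) % 3 == 0)
            for i in range(cds_start)
            if seq[i:i + 3] == "ATG"]


leader = "GCCAUGCC"              # hypothetical leader with one upstream AUG
cds = "AUGGAAGACGCC"             # start of a luciferase-like ORF
print(upstream_augs(leader + cds, len(leader)))   # [(3, False)]
```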

All of the cloned transcripts from the c-myc plasmid, 
however, had initiated within the c-myc sequence itself, 
and none had undergone splicing. A number of sites 
scattered throughout the insert had been used for initi- 
ation, but about half were clustered at a location exhibit- 
ing sequence hallmarks of a core promoter, including 
initiator (Inr) and downstream promoter element (DPE) 
homologies (32) (Figure 2D), indicating that the c-myc 
sequence indeed harbors promoter activity. An earlier 
study of putative IRES function in the sequence identified 
a 50-nt segment sufficient to drive second-cistron expres- 
sion in plasmid assays (33). This short region entirely 
encompasses the identified cluster of transcription initi- 
ation events. Furthermore, detailed dissection of the 
50-nt segment in the previous study revealed two 14-nt 
sub-sequences critical for apparent IRES function, and 
these closely coincide with the aforementioned Inr and 
DPE homologies. Considering the additional previous 
finding that siRNA targeted to the RLuc cistron 
strongly knocked down RLuc but not FLuc expression 
from pRF containing the c-myc insert (13,34), a 
compelling case emerges that the mechanism by which 
this element scores positive for IRES function in tests 
with this plasmid is primarily, if not exclusively, cryptic 
transcription. 
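The argument above rests on an initiator (Inr) and a downstream promoter element (DPE) occurring at the spacing typical of DPE-containing core promoters. The Python sketch below scans a sequence for that arrangement. The consensus strings (Inr YYANWYY with the A at +1; DPE RGWYV beginning at +28) are commonly cited values taken here as assumptions. Real DPEs tolerate some spacing variation and consensuses differ between species, so a hit is only a homology, not evidence of activity.

```python
# Sketch: scan for an Inr followed by a DPE at a fixed spacing. Consensus
# strings (Inr YYANWYY, A at +1; DPE RGWYV starting at +28) are assumptions;
# real spacing tolerance and species-specific consensuses differ.
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def motif(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a lookahead regex that finds overlapping hits."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in consensus) + ")")


INR, INR_PLUS1 = motif("YYANWYY"), 2   # the A at index 2 is the putative +1
DPE, DPE_OFFSET = motif("RGWYV"), 28


def inr_dpe_pairs(seq: str):
    """Return (tss, dpe_start) pairs where a DPE starts exactly 28 nt downstream of +1."""
    seq = seq.upper()
    dpe_starts = {m.start() for m in DPE.finditer(seq)}
    pairs = []
    for m in INR.finditer(seq):
        tss = m.start() + INR_PLUS1
        if tss + DPE_OFFSET in dpe_starts:
            pairs.append((tss, tss + DPE_OFFSET))
    return pairs


demo = "TCAGTCC" + "G" * 23 + "AGACG"   # hypothetical Inr ... DPE arrangement
print(inr_dpe_pairs(demo))              # [(2, 30)]
```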
To determine the relative abundance of the cryptic and
expected transcripts produced by pRF constructs, we
performed quantitative RT-PCR using splice isoform-specific
oligonucleotides. All of the tested plasmids produced 
high levels of RNA from the cryptic promoter in the ori, 
regardless of the presence or sequence of the intercistronic 
insert (Figure 3). The presence of a 3' ss-containing insert, 
however, increased the total levels of cryptic transcript by 
2- to 3-fold, and resulted in levels of spliced readthrough 
RNA that were in most cases higher than that of the 
expected dicistronic transcript generated from the SV40P. 
Moreover, the levels of cryptic spliced readthrough 
transcripts were little affected by the presence or absence 
of the SV40P. Collectively, our results indicate that these 
unappreciated readthrough RNAs account for most, if not 
all, of the FLuc expression stimulated by the eIF4G and 
XIAP sequences from both promoterless and promoter- 
containing pRF. 

Interestingly, deletion of the SV40P alone from pRF,
while leaving the chimeric intron intact, was previously
reported not to diminish the expression of RLuc (35). Based
on our findings, we speculate that RLuc expression from
that variant of the plasmid may result from splicing of the
3' ss of the chimeric intron with 5' ss of the cryptic
transcripts from the ori.

Cryptic readthrough transcripts explain apparent IRES 
and promoter function in the XIAP sequence when tested 
in a second reporter system 

As noted, the mechanism by which the XIAP sequence
stimulates expression of a downstream reporter gene has
been the subject of ongoing controversy (6,8,17,29,36,37).
While the 3' ss it contains has been found to cause
aberrant splicing of the intended transcripts of pRF and
other multicistronic plasmids (6,8,34), this was reported
not to be the case with the RNA produced from pβgal/CAT
(Supplementary Figure S5A), the vector originally
used to identify this putative IRES (16). A more recent
study, however, found that the XIAP element does trigger
aberrant splicing of the dicistronic RNA of pβgal/CAT, but
its ability to stimulate CAT expression was ultimately
attributed to cryptic promoter activity within the element
(17). This conclusion was based on the findings that
siRNA targeted to β-gal sequence retained in the mis-spliced
dicistronic RNAs knocked down β-gal but not CAT
expression, and that the XIAP sequence stimulated CAT
expression even when the plasmid's native cytomegalovirus
(CMV) promoter was removed. Northern blot analyses revealed
cryptic transcripts of 1.5-2.0 kb that were produced
independently of the CMV promoter and which were concluded
to initiate from the XIAP insert and mediate its apparent
IRES activity. Consistent with results of that study, our
RT-PCR amplification of the entire region of the expected
dicistronic pβgal/XIAP/CAT RNA upstream of CAT
yielded only spliced forms, none of which appeared to
represent templates suitable for efficient CAT translation
(Supplementary Figure S5B). However, our 5' RACE
analysis of the CAT-encoding transcripts produced by the
promoterless form of the plasmid revealed none that had
initiated within or near the XIAP insert; most had initiated at
the same site in the pMB1 ori identified in the pRF
plasmids (Figure 4A). As in pRF, the XIAP 3' ss had
triggered splicing events in these transcripts that positioned
the downstream reporter cistron close to the 5' cap. Each of
these RNAs was completely free of upstream AUGs and
was produced whether or not the CMV promoter was
present (Figure 4B and C). Additionally, these RNAs are
of approximately the same size as the cryptic transcripts
identified previously by northern blotting, suggesting that
they represent the same species. Together these results
strongly indicate that spliced readthrough transcripts
account for both apparent IRES and promoter function in
the XIAP sequence when tested in the pβgal/CAT system.

Figure 3. The cryptic promoter in the pMB1 ori within pRF is robust
and constitutively active.

Splicing of cryptic readthrough transcripts can mimic 
promoter function in the pGL3 system 

To determine if transcripts from the pMB1 ori could
influence the outcome of reporter assays with more
conventional, monocistronic vectors, we introduced the globin
test sequence into the multiple cloning site (MCS) of the
promoterless plasmids of the popular pGL3 system
(Supplementary Figure S6A). 5' RACE analysis of the
FLuc-encoding RNA produced by pGL3-Enhancer
containing this insert revealed an array of cryptic readthrough
transcripts similar to those produced from promoterless
pRF and pβgal/CAT plasmids harboring the eIF4G and
XIAP inserts (Figure 5A).

Insertion of the globin sequence into pGL3-Basic and
pGL3-Enhancer increased FLuc activity by 10- and
13-fold, respectively (Figure 5B). While the insert-containing
variants of pGL3-Enhancer and promoterless pRF
generated roughly similar levels of FLuc activity (12 000
versus 4700 RLU), as might be expected from their structural
similarity and the assortment of spliced readthrough
transcripts they both produce, the fold-stimulation by the
3' ss was much lower for pGL3-Enhancer. This was due
primarily to a nearly 50-fold higher baseline FLuc expression
from this plasmid. To determine whether this high background
might result at least in part from constitutive production of
readthrough transcripts spliced at unknown cryptic 3' ss, we
performed RT-PCR analysis on cells transfected with the
four parental plasmids of the pGL3 system. Such transcripts
were indeed produced by all four plasmids, and in each of the
major amplified isoforms, a cryptic 3' ss upstream of the
FLuc gene had been activated (Figure 5C and D). In
pGL3-Basic and pGL3-Enhancer, this cryptic 3' ss was
within the MCS, and in pGL3-Promoter and pGL3-Control
it was within the SV40P. All isoforms lacked
out-of-frame upstream AUGs that might impair FLuc
translation. Transcription from the pMB1 ori in pGL3 plasmids
can thus result in activation of cryptic 3' ss proximal to the
FLuc gene to generate FLuc-encoding isoforms.

To more thoroughly assess the relationship between levels 
of spliced readthrough transcripts and FLuc activity, we 
performed regression analysis on these two variables 
across the full set of pRF and pGL3 constructs tested. A 
very highly significant correlation between these variables 
(Pearson's correlation coefficient, r = 0.9760, P< 0.0001) 
was observed when pGL3-Promoter and pGL3-Control 
were excluded (Figure 6). Not surprisingly, the two latter 
plasmids expressed substantially more FLuc than would 
be predicted from their production of the readthrough 
RNA alone, as both contain the SV40P directly upstream 
of the FLuc gene. Among the other plasmids, however, 
spliced readthrough RNA levels could predict FLuc 
activity with striking consistency. 

Splicing of readthrough transcripts can mimic promoter 
function in the newer pGL4 system 

A reporter plasmid platform designated pGL4 was 
introduced in 2004 as a replacement for the pGL3 system. 
This newer plasmid series included modifications such as 
removal of potential cryptic transcription factor binding 
sites from within the backbone and reporter gene to reduce 
aberrant transcription, and codon optimization to increase 
reporter activity. pGL4 plasmids also contain the pMB1 ori,
and this sequence was left unaltered. We therefore sought to
determine if similar cryptic transcripts might also generate
FLuc activity with this system. For this purpose, we
selected the promoterless pGL4.10 and pGL4.17
(Supplementary Figure S6B), because they are the most
structurally comparable to pGL3-Basic and -Enhancer,
respectively. Introduction of the globin test sequence into the
MCS resulted in a 7-fold increase in FLuc activity from
pGL4.10 and a 45-fold increase from pGL4.17 (Figure 7A).
5' RACE analysis of the FLuc-encoding transcripts from 
pGL4.17 containing the test sequence revealed that, of 41 
transcripts cloned, 30 had initiated from the cryptic 
promoter in the ori (Figure 7B), and two had initiated from 
the SV40P in the plasmid backbone. The initiation sites for the
remaining nine could not be determined because, remarkably,
they had undergone trans-splicing with nine different cellular
RNAs (Supplementary Figure S7). Thus, pGL4 plasmids
appear to be as susceptible as pGL3 plasmids to producing
artifactual reporter activities due to transcripts from the
pMB1 cryptic promoter.

The pMB1 ori and its cryptic promoter are ubiquitous among
cloning vectors 

Because the results of this study were obtained with only a 
subset of available reporter plasmids, we sought to assess 
how common the occurrence of cryptic readthrough 
transcripts is likely to be among cloning vehicles generally.
We therefore performed homology searches of GenBank
to estimate the relative frequency of those with pMB1 or
pUC origins versus those with other replicons, including
those of ColE1, p15A and pSC101, which are also present
in a significant proportion of vectors. Of the GenBank
entries matching one of these origins, 95% were to
pMB1 or pUC, demonstrating the overwhelming
predominance of this element (Supplementary Figure S8A).
Additionally, the ColE1 and p15A sequences are both
highly homologous to the pMB1/pUC origin, particularly
in the region surrounding and upstream of the transcription
start site, and thus may also exhibit promoter
function in eukaryotic cells (Supplementary Figure S8B).
In order for the promoter to stimulate expression of a gene
in the same plasmid, it presumably must be in the same
orientation. Many commercially available reporter
plasmids have this characteristic, including several in
addition to pGL3 and pGL4 that have been in wide use
for many years (Supplementary Table S2). Queries of the
HighWire Press search engine (http://highwire.stanford.edu)
using the terms 'luciferase' and either 'pGL3' or
'pGL4' yielded 10 872 and 917 studies, respectively,
referencing these two systems alone.
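The orientation requirement above can be made concrete. On a circular plasmid, a promoter-like element can feed a reporter only if both lie on the same strand, and the distance transcription must travel wraps around the origin of the coordinate system. The Python sketch below computes that distance. The coordinates are hypothetical and are not taken from any real plasmid map.

```python
# Sketch: decide whether a promoter-like element on a circular plasmid reads
# toward a reporter gene, and how far transcription must travel to reach it.
# Coordinates and strands below are hypothetical, not from a real plasmid map.
def distance_downstream(plasmid_len: int, tss: int, tss_strand: str,
                        target: int, target_strand: str):
    """Nucleotides from tss to target along the direction of transcription,
    or None if the two features are on opposite strands."""
    if tss_strand != target_strand:
        return None
    if tss_strand == "+":
        return (target - tss) % plasmid_len
    return (tss - target) % plasmid_len


# Hypothetical 5,000-bp plasmid: ori TSS at 4,500 (+), reporter ATG at 100 (+).
print(distance_downstream(5000, 4500, "+", 100, "+"))   # 600
```

Any polyadenylation signal or cryptic splice site lying inside that span is what determines whether such readthrough RNA actually reaches the reporter as a translatable template.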

Cryptic 3' ss are predicted to occur with high frequency 
in candidate regulatory sequences 

Unlike the transcribed portions of genes, which are subject
to selection pressure against cryptic ss that could cause
aberrant splicing, the upstream regions where promoters
and other regulatory elements are located are presumably
free from such selection. To estimate the chance frequency
of cryptic 3' ss in these regions of human genes, we
generated random sequences modeled on these areas (21)
and screened them with four ss prediction programs for
potential 3' ss. Predicted 3' ss scoring as high as or higher
than the average of the four authentic 3' ss tested in this
study (eIF4G, XIAP, E2A and β-globin) occurred
0.25-1.0 times per kb (Figure 8). Predicted 3' ss scoring
as high as or higher than those we identified in the parental
pGL3 plasmids (Figure 5D) occurred approximately 1-10
times per kb, and those scoring as high as or higher than that
of the estimated average 3' ss in the Database of Aberrant
3' Splice Sites (DBASS3) (27) occurred 3-9 times per kb.
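The per-kb frequencies above come from scanning random sequences and counting predicted 3' ss whose score meets a threshold. The Python sketch below illustrates only that counting step, under simplifying assumptions: sequences are generated with independent bases at a chosen GC content (the model of ref. 21 may be more elaborate), and `toy_3ss_score` is a hypothetical stand-in for the four prediction programs that simply scores each AG acceptor by the pyrimidine fraction of the 20 nt upstream. None of the names or parameters reproduce the programs or thresholds actually used in this study.

```python
import random


def random_sequence(length, gc=0.5, seed=None):
    """Generate a random DNA sequence with independent bases at a given GC fraction.

    Simplifying assumption: the sequence model cited in the text may be more detailed.
    """
    rng = random.Random(seed)
    at = (1 - gc) / 2  # weight for A and for T
    cg = gc / 2        # weight for C and for G
    return "".join(rng.choices("ACGT", weights=[at, cg, cg, at], k=length))


def toy_3ss_score(seq, ag_index, window=20):
    """Hypothetical stand-in for a 3' ss predictor.

    Scores the AG starting at ag_index by the fraction of pyrimidines (C/T)
    in the window immediately upstream (a crude polypyrimidine-tract proxy).
    """
    start = max(0, ag_index - window)
    upstream = seq[start:ag_index]
    if not upstream:
        return 0.0
    return sum(base in "CT" for base in upstream) / len(upstream)


def sites_per_kb(seq, threshold, window=20):
    """Count AG acceptors scoring at or above threshold, normalized per kb of sequence."""
    if not seq:
        return 0.0
    hits = 0
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "AG" and toy_3ss_score(seq, i, window) >= threshold:
            hits += 1
    return hits / (len(seq) / 1000)
```

For example, `sites_per_kb(random_sequence(100_000, gc=0.6, seed=0), threshold=0.8)` gives a toy estimate for 100 kb of GC-rich random sequence. Substituting scores from a real predictor, with a threshold set by reference sites such as the authentic 3' ss tested here, would mirror the kind of comparison described above.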



DISCUSSION 

Controversy has surrounded the reliability of plasmid tests 
for IRES function (4,8,14,17,29,38,39). Here we show that 
assays for cis-regulatory function in general are prone to
producing artifactual results when plasmids in widespread 
use for this purpose are employed. We demonstrate that 
candidate regulatory sequences can strongly stimulate 
reporter gene expression from both dicistronic and 
monocistronic vectors by altering the splicing of cryptic 
RNAs constitutively produced from the plasmid ori. For 
test sequences lacking genuine transcriptional or transla- 
tional regulatory activity, the level of stimulation primar- 
ily reflects the level of altered splicing that is triggered. 
These splicing events, which can induce reporter expres- 
sion by up to hundreds-fold, can mimic promoter, 
enhancer or IRES function in the test sequence, depending
on which RNA species the plasmid is thought to produce.
Although the identified cryptic promoter appears to
be transactivatable by enhancer sequences present elsewhere
in the plasmid, it remains active even in the
absence of such elements. Plasmids pGL3-Basic and
pGL4.10, which contain no known eukaryotic enhancer
elements, both produced the cryptic RNA, and introduction
of 3' ss-containing inserts into either vector
potently stimulated second-cistron expression via these
transcripts. In a previous study, we found that introduction 
of the eIF4G or XIAP test sequences into a pGL4-based 
dicistronic vector resulted in very little detectable stimula- 
tion of second-cistron expression. The likely reason for 
this is that the second-cistron reporter in these plasmids 
was GFP, which is a far less sensitive cellular reporter 
than FLuc. 

Earlier studies have noted aberrant reporter gene 
expression caused by backbone sequences of other 
plasmids (40-42), although this problem has been viewed
as effectively solved by the introduction of termination 
elements upstream of the intended transcriptional unit 
(2). Most commercially available reporter plasmids now 
contain such signals. The upstream terminator in the 
pGL3, pGL4 and pRF plasmids is a bipartite element 
comprising a synthetic polyadenylation (SPA) signal and 
transcriptional pause sequence. The SPA was shown in 
plasmid assays to terminate transcription with high effi- 
ciency when located in an exon (43). When placed within 
an intron, however, the signal was rendered inactive. 
Addition of the pause site downstream of the SPA 
reactivated polyadenylation activity, but only to a 
fraction of that seen when the SPA was within an exon 
(44). These findings may thus explain why this otherwise
potent signal cannot prevent the production of spliced
readthrough transcripts: the presence of a 3' ss down- 
stream of the termination element creates an intron 
wherein the element is rendered weak or inactive. A 3' ss 
in a test insert may therefore stimulate reporter expression 
both by causing upstream sequences to be spliced out and 
by facilitating readthrough. 

The ability of the eIF4G, XIAP and other putative IRES 
elements to stimulate expression of the downstream gene in 
dicistronic assays has been previously attributed, fully or in 
part, to mis-splicing of the intended dicistronic RNA 
caused by the 3' ss these sequences harbor (6,7,45). Our 
results, however, indicate that most or all of this ability is
attributable to splicing of the cryptic transcripts from the
ori, as the presence or absence of the SV40 promoter in pRF
had almost no effect on either FLuc expression or levels of
spliced readthrough RNA. Moreover, in the case of
pβgal/XIAP/CAT, siRNAs targeted to first-cistron sequences
retained in the aberrantly spliced dicistronic RNA were
found to be incapable of reducing CAT expression (17),
suggesting that for this plasmid, too, second-cistron
expression occurs primarily via the spliced
readthrough RNA, which is produced whether or not the
CMV promoter is present.

A large number of other sequences originally suspected to 
be IRES elements have also proven to stimulate second- 
cistron expression from dicistronic plasmids whether or 
not the dicistronic unit's promoter was present. In each of 
the 28 cases we found in a search of the literature, pRF was 
the plasmid employed and the reporter assay results were 
interpreted as demonstrating cryptic promoter activity in 
the test sequence (13,14,46-60). For only two of these
elements (49,56), however, was any attempt made to map the
transcription initiation site. In both instances, the method
used was a nuclease protection assay with a probe complementary
to the insert. This approach would be unable to
identify initiation sites outside of the probed region.
Moreover, if a 3' ss within the insert had undergone
splicing, the protected fragment would terminate at that 3' ss,
and the analysis would therefore suggest initiation at that
location. Perhaps significantly, the putative initiation sites
mapped for one of these two sequences, the MYEOV 5' 
UTR (49), lie adjacent to a predicted 3' ss this sequence 
contains (Supplementary Figure S9). 

The number of inaccurate conclusions that have been
drawn regarding cis-regulatory function due to unexpected
RNA species such as those described here is difficult to
estimate but potentially large given the widespread use of
vectors harboring the pMB1 ori. In light of the frequency at
which cryptic 3' ss are predicted to occur in genomic DNA, 
our findings indicate that the results of reporter tests of 
candidate regulatory sequences should be regarded as 
provisional until unexpected and confounding RNA 
species have been ruled out, or corroborating evidence 
from other sources is obtained. Going forward, the development
of reporter plasmids with reduced cryptic
transcription or improved insulation of the reporter gene
unit would greatly facilitate the study of cis-regulation.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1 and 2 and Supplementary 
Figures 1-9. 